Identifying the L1 of non-native writers: the CMU-Haifa system

Authors

  • Yulia Tsvetkov
  • Naama Twitto
  • Nathan Schneider
  • Noam Ordan
  • Manaal Faruqui
  • Victor Chahuneau
  • Shuly Wintner
  • Chris Dyer
Abstract

We show that it is possible to learn to identify, with high accuracy, the native language of English test takers from the content of the essays they write. Our method uses standard text classification techniques based on multiclass logistic regression, combining individually weak indicators to predict the most probable native language from a set of 11 possibilities. We describe the various features used for classification, as well as the settings of the classifier that yielded the highest accuracy.
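
As a rough illustration of the approach the abstract describes, the sketch below trains a multiclass (softmax) logistic regression over bag-of-words counts and returns the most probable label. It is a minimal sketch under stated assumptions, not the authors' system: the feature set, the hyperparameters (`epochs`, `lr`, `l2`), the helper names, the placeholder labels `L1_A` to `L1_C`, and the toy sentences are all invented for illustration, and it uses three classes rather than the 11 in the paper.

```python
import math
from collections import defaultdict

def featurize(text):
    """Bag-of-words counts plus a bias term (a stand-in for richer features)."""
    feats = defaultdict(float)
    feats["<bias>"] = 1.0
    for tok in text.lower().split():
        feats["w=" + tok] += 1.0
    return feats

def class_probs(weights, labels, feats):
    """Softmax over per-label linear scores; each feature is a weak indicator."""
    scores = [sum(weights[(lab, f)] * v for f, v in feats.items()) for lab in labels]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(data, labels, epochs=100, lr=0.1, l2=1e-3):
    """Stochastic gradient ascent on the L2-regularized log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, gold in data:
            feats = featurize(text)
            probs = class_probs(weights, labels, feats)
            for lab, p in zip(labels, probs):
                err = (1.0 if lab == gold else 0.0) - p
                for f, v in feats.items():
                    key = (lab, f)
                    weights[key] += lr * (err * v - l2 * weights[key])
    return weights

def predict(weights, labels, text):
    """Return the label with the highest predicted probability."""
    probs = class_probs(weights, labels, featurize(text))
    return max(zip(probs, labels))[1]

# Hypothetical toy data: invented sentences with placeholder labels,
# not real learner essays and not tied to any actual native language.
LABELS = ["L1_A", "L1_B", "L1_C"]
TOY_DATA = [
    ("i am agree with this statement because", "L1_A"),
    ("i am agree that people travel more", "L1_A"),
    ("he has many informations about the topic", "L1_B"),
    ("the informations in this essay are many", "L1_B"),
    ("we discussed about the plan yesterday already", "L1_C"),
    ("they discussed about travel already", "L1_C"),
]

weights = train(TOY_DATA, LABELS)
print(predict(weights, LABELS, "i am agree"))
```

No single word decides the outcome here; the prediction comes from summing many small per-feature weights for each candidate label, which is the sense in which individually weak indicators are combined.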


Similar resources

Move-based investigation of appraisal in the introduction section of Applied Linguistics research articles: Similarities and differences between L1 and L2 English texts

Recent research has shown that academic writing is not ‘author-evacuated’ but, rather, carries a representation of the writers’ identity. One way through which writers project their identity in academic writing is stance-taking toward propositions advanced in the text. Appropriate stance-taking has proved to be challenging for novice writers of Research Articles (RAs), especially those writing ...


A Comparative Analysis of Self-Mentions in Applied Linguistics PhD Dissertations Written by Native and Non-Native English Writers

The purpose of the present study was to compare the PhD dissertations written by native and nonnative English writers in the field of Applied Linguistics with regard to the use of self-mentions. To this end, 40 Applied Linguistics PhD dissertations (20 written by native English writers and 20 by non-native English writers), were selected randomly among academic texts written in 2007-2017. The p...


Metadiscourse Elements in English Research Articles Written by Native English and Non-native Iranian Writers in Applied Linguistics and Civil Engineering

This study investigated metadiscourse and its subcategories in English research articles (RAs) written by nonnative (Iranian) and native English writers from the two disciplines of applied linguistics and civil engineering. The study aimed at seeing whether language and discipline influenced the frequency of occurrence of metadiscourse elements in research articles. To this end, a sample of 120...


Clause Complexity in Applied Linguistics Research Article Abstracts by Native and Non-Native English Writers: Taxis, Expansion and Projection

Halliday’s Systemic Functional Linguistics (SFL) has stood the test of time as a model of text analysis. The present literature contains a plethora of studies that while taking the ‘clause’ as a unit of analysis have put into investigation the metafunctions in research articles of a single field of study or those of various fields in comparison. Although ‘clause complex’ is another unit of SF a...


Hedges and Boosters in Academic Writing: Native vs. Non-Native Research Articles in Applied Linguistics and Engineering

The expression of doubt and certainty is crucial in academic writing where the authors have to distinguish opinion from fact and evaluate their assertions in acceptable and persuasive ways. Hedges and boosters are two strategies used for this purpose. Despite their importance in academic writing, we know little about how they are used in different disciplines and genres and how foreign language...




Publication date: 2013